TODO

  • Find exemplary papers for each of the simulated data sets
  • For further synthetic data, see the paper by Robinson et al. (2021)
  • Possibly extend the simulated data sets with examples from https://mjskay.github.io/ggdist/articles/lineribbon.html (especially the first example with x-shifted Gaussian curves)!
  • Replace the simulated data with functions? Currently rather unnecessary, since the time series are already modeled using functions. Moreover, no interpolation, differentiation, or the like is performed … ultimately rather unsuitable for the current idea of the paper (plus, one avoids problems such as possible smoothing artifacts or the definition of a functional mean).
  • If FDA after all, then possibly fork it as a stand-alone script
  • Create the .bib file
  • Reconsider the term spatiotemporal (how is it defined in the context of FDA)
  • Get hold of Ratkowsky
  • Justify why prediction intervals rather than confidence intervals (in theory, one could compute both; prediction intervals, however, seem to be what is actually wanted here)

TODO read (again)

  • Robinson, Vanrenterghem, Pataky (2021) Sample size estimation for biomechanical waveforms: Current practice, recommendations and a comparison to discrete power analysis
  • Olshen et al. (1989) Gait analysis and the bootstrap
  • Curvewise point and interval summaries for tidy data frames of draws from distributions https://mjskay.github.io/ggdist/reference/curve_interval.html
  • Horvath, Kokoszka & Reeder (2013) Estimation of the mean of functional time series and a two-sample problem
  • In general, the joint papers by Hörmann and Kokoszka on FDA statistics, in particular ‘Inference for Functional Data with Applications’ (2012)
  • Ratkowsky (1990) Handbook of nonlinear regression models

Introduction

Write a different intro here … approach: besides the validation of measurement devices, the validation of model outputs also plays a role!

Therefore, when analyzing the validity of curve data, the entire length of the curve should be accounted for. This can be accomplished, e.g., by applying statistics separately to all points of the curve (Pini et al., 2019). The pointwise approach allows for an error representation over the entire domain, but ignores the fact that the sampling points within a curve are locally correlated (Deluzio & Astephen, 2007; Pataky, 2010).

From a statistical point of view, this implies that the random error component is likely underrepresented when the (nonzero) covariance term of adjacent sampling points is not accounted for (Lenhoff et al., 1999). In addition, most traditional point statistics are either unsuitable (e.g., statistical tests of mean differences) or suitable only to a limited extent (e.g., the product-moment correlation) for quantifying the agreement between two measurement systems (Bland & Altman, 1986).
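For scalar paired data, the classic Bland & Altman limits of agreement are simply the mean difference ± 1.96 standard deviations of the differences. A minimal sketch (the measurement values are made up for illustration):

```r
# Classic Bland & Altman limits of agreement for scalar paired data
# (illustrative sketch; the measurement values are made up)
device.1 <- c(10.1, 11.3, 9.8, 10.7, 10.2, 11.0)
device.2 <- c(10.4, 11.1, 10.2, 10.9, 10.0, 11.4)

differences <- device.1 - device.2

bias <- mean(differences)            # systematic error
loa.width <- 1.96 * sd(differences)  # random error component

loa <- c(lower = bias - loa.width,
         upper = bias + loa.width)
```

The functional methods discussed below generalize exactly these two error components to entire curves.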

One possibility to address the problem of pointwise analysis is to treat entire curves as (functional) objects rather than as a series of unconnected points.

Sutherland et al. (1988) introduced a method in which continuous signals are reconstructed using Fourier series and bootstrapped to estimate the variation of the sample mean and thus construct confidence intervals. Lenhoff et al. (1999) illustrated this method on a generic gait analysis data set (comprised of joint angle curves) and compared it to pointwise Gaussian intervals. The authors found that the bootstrap bands of Sutherland et al. provide appropriate coverage for continuous gait curve data (86% coverage for a targeted coverage of 90%), while pointwise Gaussian bands provide inadequate coverage (54% for a targeted coverage of 90%).
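The core idea of such bands can be sketched in a few lines of R: resample the curves with replacement, recompute the mean curve, and take pointwise percentiles. This is a simplified percentile bootstrap on synthetic noisy sine curves; the original procedure additionally reconstructs each signal via a Fourier series first.

```r
set.seed(1)

n.curves <- 20
t <- seq(0, 100)

# Synthetic sample: noisy sine curves (rows = curves, columns = time points)
curves <- t(replicate(n.curves, sin(0.06 * t) + rnorm(length(t), sd = 0.1)))

n.boot <- 1000
boot.means <- matrix(NA, nrow = n.boot, ncol = length(t))

for (b in 1:n.boot) {
  resample.idx <- sample(1:n.curves, replace = TRUE)
  boot.means[b, ] <- colMeans(curves[resample.idx, ])
}

# Pointwise 90% percentile band for the mean curve
band <- apply(boot.means, 2, quantile, probs = c(0.05, 0.95))
```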

Røislien et al. (2012) presented a similar approach to calculate a functional version of the Limits of Agreement (LoA) method suggested by Bland & Altman. They first calculate functional difference curves, which are then bootstrapped to obtain 95% confidence intervals. The approach accounts for the autocorrelation characteristic of the signal and at the same time allows for an intuitive description of the systematic and random error terms in the original sampling space (e.g., actual degrees). Røislien et al., however, rely on a number of strong model assumptions, which are often not or only partially fulfilled in empirical data sets:

  1. The error distribution is approximately normal.

The construction of symmetric prediction bands as described in Røislien et al. [10] relies on Gaussian assumptions. Non-normality, however, is often present when e.g. sensor values are truncated because of limited range or resolution, or because values drift over time.

  2. Time-continuous differences of two measurement systems and the corresponding averages must be uncorrelated.

This assumption is violated considerably in many validation data sets [12]. For example, Atallah et al. [13] found that the agreement between an ear-worn accelerometer and a force-plate instrumented treadmill was associated with gait speed when estimating gait cycle duration. When validating a foot-worn gyroscope for treadmill running, Koska & Maiwald [14] observed a linear error trend that was associated with foot strike behavior.

  3. From a data set consisting of one curve per subject, bootstrapping mean curves yields uncertainty intervals that represent a suitable measure of the estimation uncertainty.

Here, arithmetic mean curves are used, and their variability reflects the variation between means. In many cases, however, it is the variability of individual values that is relevant, and the limits will then be too narrow by a factor of approximately the square root of the number of curves in the sample.
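The sqrt(n) narrowing is easy to verify numerically; a toy sketch with one normally distributed scalar value per curve:

```r
set.seed(2)

n.curves <- 25
n.boot <- 2000

# Toy sample: one scalar value per curve
values <- rnorm(n.curves, mean = 0, sd = 1)

# The spread of bootstrapped means reflects estimation uncertainty ...
boot.means <- replicate(n.boot, mean(sample(values, replace = TRUE)))

# ... and is roughly sd(values) / sqrt(n.curves), i.e. much narrower than
# the spread of individual values that prediction intervals should capture
ratio <- sd(values) / sd(boot.means)  # approximately sqrt(n.curves) = 5
```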

A further problem is that the method does not account for repeated-measures data. Since repeatedly measured curves are usually more similar to one another than independent curves, there is a risk of overestimating the scatter of the measurement system.

The principal aim of this paper is to demonstrate a method for constructing continuous prediction intervals from curve data that addresses the aforementioned issues (analyzing the whole curve, accommodating repeated measures, and relaxing model assumptions such as normality and equal variances), and thus enables a more appropriate characterization of the agreement between measurement curves on the basis of easy-to-interpret error bands (i.e., LoA in the actual measurement unit).

We use synthetic time series data sets containing typical curves of biomechanical measurement systems to illustrate the method, and compare the resulting prediction bands to pointwise Bland & Altman parameters and to LoA constructed similarly to the method described in Røislien et al. (2012). For model validation, the respective LoA are cross-validated using a leave-one-out approach.

Methods

Limits of agreement methods

FLoA derived by the following methods are compared:

  • Randomized Cluster Bootstrap (FLoARCB)
    • n = length(subjects) random strides from all strides (FLoARCB_v1)
    • One random stride per subject (FLoARCB_v2)
    • Fetch a SINGLE random stride from all strides (FLoARCB_v3)
  • Pointwise Gaussian intervals (FLoAPoint)
    • FLoAPoint are calculated using means and standard deviations (derived from linear mixed-effects models)
  • Røislien-like intervals (FLoARoislien)
    • Get one random stride from each subject ONCE and bootstrap the resulting sample (of length n = length(subjects))
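Assuming a long-format stride table with columns subjectID and strideID (these column names are assumptions for illustration), one bootstrap draw for the three RCB variants might look as follows:

```r
set.seed(3)

# Toy stride table: 3 subjects x 4 strides each (column names are assumptions)
strides <- expand.grid(subjectID = 1:3, strideID = 1:4)
n.subjects <- length(unique(strides$subjectID))

# FLoARCB_v1: n = length(subjects) random strides from all strides
v1 <- strides[sample(nrow(strides), size = n.subjects, replace = TRUE), ]

# FLoARCB_v2: one random stride per subject
v2 <- do.call(rbind, lapply(split(strides, strides$subjectID),
                            function(s) s[sample(nrow(s), 1), ]))

# FLoARCB_v3: a SINGLE random stride from all strides
v3 <- strides[sample(nrow(strides), size = 1), ]
```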

FLoARCB construction

Validation

Synthetic data sets

To compare different methods for constructing continuous LoA, several sets of smooth curve data were simulated covering a range of common biomechanical signal and error characteristics. The curves within each data set are deliberately similar, as the paper is primarily aimed at a validation context in which the same movement process is analyzed using similar methods (two measurement devices or models). For the same reason, the curves do not necessarily correspond to real-world signals, as we were more concerned with illustrating the respective methods and error characteristics.

All curves were modeled as Fourier series from additively superimposed sine functions. Each data set contains n.strides = 10 curves from n.subj = 11 fictitious test subjects. Each curve consists of t = 101 data points.

n.subj <- 11
n.strides <- 10

t <- seq(0, 100)  # 101 sampling points per curve

for (subject.idx in 1:n.subj) {

  # Subjectwise parameters
  offset.mean <- runif(1, min = -0.5, max = 0.5)

  a.sd <- runif(1, min = 0.05, max = 0.15)
  b.sd <- runif(1, min = 0.0001, max = 0.002)

  for (stride.idx in 1:(n.strides)) {

    a1.device1 <- rnorm(1, mean = 3, sd = a.sd)
    a1.device2 <- rnorm(1, mean = 3, sd = a.sd)
    a2.device1 <- 0.08
    a2.device2 <- 0.08
    b1.device1 <- rnorm(1, mean = 0.06, sd = b.sd)
    b1.device2 <- rnorm(1, mean = 0.06, sd = b.sd)
    b2.device1 <- rnorm(1, mean = 0.58, sd = b.sd)
    b2.device2 <- rnorm(1, mean = 0.58, sd = b.sd)
    c <- 2

    sine.1 <- a1.device1 * sin(b1.device1 * t) ^ (c + 3)
    sine.2 <- a2.device1 * sin(b2.device1 * t)
    sine.3 <- a1.device2 * sin(b1.device2 * t) ^ (c + 3)
    sine.4 <- a2.device2 * sin(b2.device2 * t)

    offset <- rnorm(1, offset.mean, 0.05)
    
    # Model-specific curve construction: one of the model variants (i)-(iv)
    # below is inserted here
    # ...

  1. Homoscedastic (normal) error: within- and between-subject differences between curves were modeled using a random systematic offset (bias) and random variation of the curve parameters. The data are representative of, e.g., joint angle measurements simultaneously recorded with two kinematic measurement systems (e.g., a camera-based system and a goniometer).

# Model (i)
device.1 <- sine.1 + sine.2
device.2 <- offset + sine.3 + sine.4

  2. Trend error: same model as in (i), but with a nonlinear trend underlying one of the two signals. Similar errors can be observed, e.g., when joint angles are calculated (numerically integrated) from gyroscope signals.

# ...
trend <- (1 / 100000) * seq(0.5, 50.5, 0.5)^3

# Model (ii)
device.1 <- sine.1 + sine.2
device.2 <- offset + sine.3 + sine.4 + trend

  3. Heteroscedastic (non-Gaussian, Weibull-distributed) error: heteroscedasticity across the signal length. The example mimics knee moment curves calculated using two different methods (a direct and an inverse kinematic approach) (Robinson et al., 2014). The curves were recreated using the example code provided in Robinson et al. (2021).

# ...
a1.device2 <- rnorm(n = 1,
                    # factorial(1/1.5) = gamma(1 + 1/1.5), the Weibull mean,
                    # so the amplitude error is centered around 0
                    mean = rweibull(1, shape = 1.5, scale = 1) - factorial(1 / 1.5),
                    sd = a.sd)
# ...
sine.3 <- a1.device2 * sin(b1.device2 * t) ^ (c + 3)
# ...

# Model (iii)
device.1 <- sine.1 + sine.2
device.2 <- offset + sine.3 + sine.4
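A quick numerical check of the centering term used in model (iii): in R, factorial(x) equals gamma(x + 1), and scale * gamma(1 + 1/shape) is the mean of a Weibull distribution, so subtracting factorial(1/shape) centers the simulated amplitude error around zero.

```r
set.seed(4)

shape <- 1.5

# factorial(1/shape) = gamma(1 + 1/shape) = mean of Weibull(shape, scale = 1)
theoretical.mean <- factorial(1 / shape)

draws <- rweibull(1e5, shape = shape, scale = 1)
centered <- draws - theoretical.mean  # empirical mean is close to 0
```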
  4. Data with (missing) peaks: same model as in (i), except for a couple of frames in which a short, prominent peak occurs in one of the two signals. This typically happens in high-frequency signals such as accelerations or ground reaction forces, where peaks are underrepresented or entirely missing when, e.g., sampling rates are too low or low-pass filter artefacts occur. Modeling this kind of abrupt change in the signal (e.g., when estimating ground reaction forces from acceleration values) is also more difficult and may result in similar errors (Alcantara, 2021).

Smooth curves (homoscedasticity, normal error, no trend)

Smooth curves with nonlinear error trend

Heteroscedasticity across the signal length

Smooth curves with (missing) peaks

Curves phase shifted in x-axis direction

Cross validation

Leave-one out method to estimate the achieved coverage

Prediction intervals are calculated from size.dataset = number.subjects - 1 subjects and then validated against the left-out subject's curves for five different methods:

  • FLoARCB_v1 : n = length(subjects) random strides from all strides

  • FLoARCB_v2 : One random stride per subject

  • FLoARCB_v3 : Fetch a SINGLE random stride from all strides

  • FLoARoislien : Røislien approach (get one random stride from each subject ONCE and bootstrap the resulting sample of length n = length(subjects))

  • FLoAPoint : Pointwise B & A Limits of Agreement
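The leave-one-out scheme itself is independent of the interval method. A sketch of the loop structure; floa.method() and check.coverage() are hypothetical placeholders standing in for any of the five interval constructors and the subsequent coverage check:

```r
subject.ids <- 1:11
training.sizes <- integer(0)

for (left.out in subject.ids) {
  # Fit the limits on all subjects except the left-out one
  training.ids <- setdiff(subject.ids, left.out)
  training.sizes <- c(training.sizes, length(training.ids))

  # floa <- floa.method(data[data$subjectID %in% training.ids, ])  # placeholder
  # coverage[left.out] <- check.coverage(floa, left.out)           # placeholder
}
```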

Results

Functional limits of agreement

Leave-one-out cross validation

Displayed are boxes for five different methods:

    1. FLoARCB_v1 : n = length(subjects) random strides from all strides
    2. FLoARCB_v2 : One random stride per subject
    3. FLoARCB_v3 : Fetch a SINGLE random stride from all strides
    4. FLoARoislien : Røislien approach (get one random stride from each subject ONCE and bootstrap the resulting sample of length n = length(subjects))
    5. FLoAPoint : Pointwise B & A Limits of Agreement

Coverages are displayed for two different scenarios:

  • a: Percentage of entirely covered curves (all points within the limits): Each box consists of n = 11 values (one percentage for each subject).

  • b: Mean percentage of curve points within the respective FLoA: The fraction of points along a curve within the limits is calculated and averaged across all curves of a subject.

Smooth, wave data (normal error, constant variance, no trend)

  a. (coverage of the entire curve)
lwr.bnd <- floa["lower", ]
upr.bnd <- floa["upper", ]

# ...

outside <- 0  # number of curves with at least one point outside the limits

for (stride.idx in stride.indices){
    curve <- subset(device.diff, strideID == stride.idx)

    # Compare difference curves with upper and lower boundaries
    below.thresh <- curve$value < lwr.bnd
    above.thresh <- curve$value > upr.bnd

    points.outside <- sum(above.thresh) + sum(below.thresh)

    # Check if the entire curve is within the limits
    if (points.outside > 0) {
      outside <- outside + 1
    }
  }

coverage <- 1 - (outside / n.strides)

  b. (mean percentage of covered curve points)
lwr.bnd <- floa["lower", ]
upr.bnd <- floa["upper", ]

# ...

inside.thresh.perc <- c()

for (stride.idx in stride.indices){

  curve <- subset(device.diff, strideID == stride.idx)

  # Compare difference curves with upper and lower boundaries
  below.thresh <- curve$value < lwr.bnd
  above.thresh <- curve$value > upr.bnd

  # Fraction of curve points inside the limits
  inside <- !(below.thresh | above.thresh)
  inside.thresh.perc <- c(inside.thresh.perc, mean(inside))
}

Curves with nonlinear trend (constant variance)

  a. (coverage of the entire curve)

  b. (mean percentage of covered curve points)

Curves with non-gaussian (Weibull distributed) error (no trend)

  a. (coverage of the entire curve)

  b. (mean percentage of covered curve points)

Curves with (missing) peaks

  a. (coverage of the entire curve)

  b. (mean percentage of covered curve points)

Curves phase shifted in x-axis direction

  a. (coverage of the entire curve)

  b. (mean percentage of covered curve points)

Discussion

It might be beneficial to work with functional data as suggested by Lenhoff et al. (1999) or Røislien et al. (2012), e.g., when interpolating data to a standardized length. This approach, however, was not necessary in this paper, since the synthetic data were created from functions and were designed to have equal length (101 data points each).

We recommend plotting the data. In the end, every data set and every research question is different, and a single overall coverage value may not do justice to the quality of the measurement device (with respect to the particular research question).

Conclusion